@Allda
Collaborator

@Allda Allda commented Oct 17, 2025

A new backup controller orchestrates the backup process for workspace PVCs. A new configuration option in DevWorkspaceOperatorConfig enables a regular cron job that is responsible for the backup mechanism. The job executes the following steps:

  • Find all workspaces
  • Determine which workspaces have been recently stopped
  • Detect the workspace PVC
  • Execute a job in the same namespace that performs the backup

The last step is not yet fully implemented because it requires running buildah inside the container; it will be delivered as a separate feature.
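The selection steps listed above can be sketched roughly as follows. This is a minimal illustration, not the controller's actual code: the `Workspace` struct, `needsBackup`, `planBackups`, and `sample` are hypothetical names, and the rule "stopped after the last backup run" is an assumed interpretation of "recently stopped".

```go
package main

import (
	"fmt"
	"time"
)

// Workspace is a simplified, hypothetical stand-in for a DevWorkspace.
type Workspace struct {
	Name      string
	Namespace string
	Started   bool
	StoppedAt time.Time // zero value if the workspace was never stopped
	PVCName   string    // empty if no PVC could be detected
}

// needsBackup reports whether ws is stopped and was stopped after lastRun.
func needsBackup(ws Workspace, lastRun time.Time) bool {
	if ws.Started || ws.StoppedAt.IsZero() {
		return false
	}
	return ws.StoppedAt.After(lastRun)
}

// planBackups returns one job description per workspace that needs a backup
// and has a detectable PVC.
func planBackups(all []Workspace, lastRun time.Time) []string {
	var jobs []string
	for _, ws := range all {
		if !needsBackup(ws, lastRun) || ws.PVCName == "" {
			continue
		}
		jobs = append(jobs, fmt.Sprintf("backup %s/%s from PVC %s", ws.Namespace, ws.Name, ws.PVCName))
	}
	return jobs
}

// sample returns illustrative workspaces and the time of the previous run.
func sample() ([]Workspace, time.Time) {
	lastRun := time.Date(2025, 10, 17, 10, 0, 0, 0, time.UTC)
	return []Workspace{
		{Name: "running", Namespace: "default", Started: true, PVCName: "claim-devworkspace"},
		{Name: "stopped-recently", Namespace: "default", StoppedAt: lastRun.Add(5 * time.Minute), PVCName: "claim-devworkspace"},
		{Name: "stopped-long-ago", Namespace: "default", StoppedAt: lastRun.Add(-time.Hour), PVCName: "claim-devworkspace"},
	}, lastRun
}

func main() {
	workspaces, lastRun := sample()
	for _, job := range planBackups(workspaces, lastRun) {
		fmt.Println(job)
	}
}
```

Only the workspace stopped after the previous run is selected; running workspaces and ones stopped before the previous run are skipped.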

Issue: eclipse-che/che#23570

What does this PR do?

What issues does this PR fix or reference?

Is it tested? How?

The feature has been tested locally and with integration tests. Add the following configuration to the operator config to enable this feature:

config:
  workspace:
    backupCronJob:
      enable: true
      registry: kind-registry:5000/backup
      schedule: '* * * * *'

After the config is added, stop any workspace and wait until a backup job is created.

$ kubectl get jobs
devworkspace-backup-2l679   Running    0/1           138m       138m
devworkspace-backup-2xvgl   Running    0/1           139m       139m
devworkspace-backup-45vxb   Running    0/1           145m       145m

The job creates a backup and pushes the image to the registry:

+ set -e
+ exec /workspace-recovery.sh --backup
+ set -e
+ for i in "$@"
+ case $i in
+ backup
+ BACKUP_IMAGE=kind-registry:5000/backup/backup-default-common-pvc-test:latest
++ buildah from scratch
+ NEW_IMAGE=working-container
+ buildah copy working-container /workspace/workspacedfd9f53065ea452c//projects /
f099c09f924cf051a01d78cd34ca87a4c161d7c217df5ac627e90e66926fbe9f
+ buildah config --label DEVWORKSPACE=common-pvc-test working-container
+ buildah config --label NAMESPACE=default working-container
+ buildah commit working-container kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
+ buildah umount working-container
+ buildah push --tls-verify=false kind-registry:5000/backup/backup-default-common-pvc-test:latest
Getting image source signatures
Copying blob sha256:137b2a0909654325b7eff0a9dfe623e5abdc685c4d6ad8e4c8d163e0984cf805
Copying config sha256:86693ca728855121a4dce059d91c6c9a196b4611fea4cb17d7b38015310cf193
Writing manifest to image destination
stream closed: EOF for default/devworkspace-backup-zjzk5-82psq (backup-workspace)
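
The BACKUP_IMAGE value in the log above appears to follow the pattern `<registry>/backup-<namespace>-<workspace>:latest`. A minimal sketch of that naming, inferred only from the single log line (the helper name `backupImageRef` is hypothetical and the real script may build the reference differently):

```go
package main

import (
	"fmt"
	"strings"
)

// backupImageRef builds "<registry>/backup-<namespace>-<workspace>:latest".
// The pattern is inferred from the BACKUP_IMAGE value in the job log.
func backupImageRef(registry, namespace, workspace string) string {
	// Tolerate a trailing slash in the configured registry path.
	registry = strings.TrimSuffix(registry, "/")
	return fmt.Sprintf("%s/backup-%s-%s:latest", registry, namespace, workspace)
}

func main() {
	fmt.Println(backupImageRef("kind-registry:5000/backup", "default", "common-pvc-test"))
	// prints "kind-registry:5000/backup/backup-default-common-pvc-test:latest"
}
```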

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

@openshift-ci

openshift-ci bot commented Oct 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Allda
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Allda Allda force-pushed the 23570 branch 2 times, most recently from 42dd45c to dffd7e6 Compare October 17, 2025 11:06
@rohanKanojia
Member

@Allda : Really appreciate you taking the time to contribute this in such a short time. 🎉

Could you please also fill out the “Is it tested? How?” section in the PR template? It’ll help reviewers and future contributors verify the change more easily.

Thanks again for your effort! 🙌

@rohanKanojia
Member

I tested this PR and it seems to work.

  1. Created DevWorkspaceOperatorConfig with this BackupCronJobConfig (backup every 3 minutes)
config:
  workspace:
    backupCronJob:
      enable: true
      schedule: "*/3 * * * *"
  2. Created a DevWorkspace and waited for it to start running
  3. Stopped the workspace
  4. The controller detected the stopped workspace and started creating backup jobs:
NAME               STATUS    COMPLETIONS   DURATION   AGE
backup-job-8tnsp   Running   0/1                      0s
backup-job-8tnsp   Running   0/1           0s         0s
backup-job-8tnsp   Running   0/1           16s        16s
backup-job-8tnsp   Running   0/1           17s        17s
backup-job-8tnsp   Running   0/1           18s        18s
backup-job-8tnsp   Complete   1/1           18s        18s
backup-job-kc8rm   Running    0/1                      0s
backup-job-kc8rm   Running    0/1           0s         0s
backup-job-kc8rm   Running    0/1           6s         6s
backup-job-kc8rm   Running    0/1           7s         7s
backup-job-kc8rm   Running    0/1           8s         8s
backup-job-kc8rm   Complete   1/1           8s         8s

@Allda Allda force-pushed the 23570 branch 3 times, most recently from 0bc74b1 to 8427ba5 Compare October 29, 2025 10:24
@Allda
Collaborator Author

Allda commented Oct 29, 2025

/retest

@codecov

codecov bot commented Nov 3, 2025

Codecov Report

❌ Patch coverage is 64.13043% with 165 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.30%. Comparing base (d92e750) to head (2679783).
⚠️ Report is 16 commits behind head on main.

Files with missing lines                                Patch %   Lines
...trollers/backupcronjob/backupcronjob_controller.go   71.95%    87 Missing and 19 partials ⚠️
apis/controller/v1alpha1/zz_generated.deepcopy.go       0.00%     43 Missing ⚠️
main.go                                                 0.00%     9 Missing ⚠️
internal/images/image.go                                0.00%     7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1530      +/-   ##
==========================================
+ Coverage   34.09%   35.30%   +1.21%     
==========================================
  Files         160      161       +1     
  Lines       13348    13802     +454     
==========================================
+ Hits         4551     4873     +322     
- Misses       8487     8599     +112     
- Partials      310      330      +20     

Contributor

@ibuziuk ibuziuk left a comment

@Allda great job!
I discussed the overall PR with @dkwon17, and I believe we should target merging it in DWO 0.39.0.

A backup job uses the default PVC name, or the name from the config if
the user configured a custom one.

Signed-off-by: Ales Raszka <[email protected]>
The backup job can now push to registries that require an auth token. The
token is provided as a secret in the operator namespace and added to the
operator config.

Signed-off-by: Ales Raszka <[email protected]>
A backup job now determines the name of the PVC based on the storage type in use.
It distinguishes between storage types (common and per-workspace) and
mounts the volume dynamically.

Signed-off-by: Ales Raszka <[email protected]>
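
As an illustration of the storage-type distinction described in the commit above, a PVC lookup might look like the sketch below. Everything here is an assumption made for illustration: the helper `pvcNameFor` and the `StorageType` constants are hypothetical, `claim-devworkspace` is taken from the ClaimName shown in a job description later in this conversation, and the `storage-<workspaceID>` name for per-workspace storage is assumed rather than confirmed by this PR.

```go
package main

import "fmt"

// StorageType is a hypothetical enum for the workspace storage strategy.
type StorageType string

const (
	StorageCommon       StorageType = "common"
	StoragePerWorkspace StorageType = "per-workspace"
)

// pvcNameFor returns the PVC a backup job would mount for a workspace.
// commonPVCName may be empty, in which case an assumed default is used.
func pvcNameFor(st StorageType, workspaceID, commonPVCName string) (string, error) {
	switch st {
	case StorageCommon:
		if commonPVCName == "" {
			commonPVCName = "claim-devworkspace" // default seen in the job description
		}
		return commonPVCName, nil
	case StoragePerWorkspace:
		// Assumed naming scheme; not confirmed by this conversation.
		return "storage-" + workspaceID, nil
	default:
		return "", fmt.Errorf("storage type %q has no PVC to back up", st)
	}
}

func main() {
	for _, st := range []StorageType{StorageCommon, StoragePerWorkspace, "ephemeral"} {
		name, err := pvcNameFor(st, "workspace388fc6409bc642e7", "")
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", st, name)
	}
}
```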
It turns out the capabilities from the prototype are not needed.

Signed-off-by: Ales Raszka <[email protected]>
A new SA is created for the backup jobs to limit the permissions to just
what is necessary.

Signed-off-by: Ales Raszka <[email protected]>
- Make registry field required
- Replace custom bool comparison with library
- Minor tweaks

Signed-off-by: Ales Raszka <[email protected]>
Use single logger across the controller and only add context if needed.

Signed-off-by: Ales Raszka <[email protected]>
@dkwon17
Collaborator

dkwon17 commented Nov 14, 2025

@Allda maybe I'm missing something but I am getting this error:

LAST SEEN   TYPE      REASON                  OBJECT                                                    MESSAGE
25s         Warning   FailedCreate            job/devworkspace-backup-29mb2                             Error creating: pods "devworkspace-backup-29mb2-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{0}: 0 is not an allowed group, provider restricted-v2: .containers[0].runAsUser: Invalid value: 0: must be in the ranges: [1000870000, 1000879999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "container-build": Forbidden: not usable by user or serviceaccount, provider "user-namespace": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid-v2": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner-csi": Forbidden: not usable by user or serviceaccount, provider "insights-runtime-extractor-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

This is my DWOC:

apiVersion: controller.devfile.io/v1alpha1
config:
  workspace:
    backupCronJob:
      enable: true
      registry: quay.io/dkwon17/test
      registryAuthSecret: quay-credentials
      schedule: '* * * * *'
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config
  namespace: openshift-operators

Any ideas?

@Allda
Collaborator Author

Allda commented Nov 18, 2025

Could you please share more details about the created Job? Are there any events or logs? I tried it today with a similar config (only difference is registry location) and the push Job passed.

@dkwon17
Collaborator

dkwon17 commented Nov 18, 2025

Here is the job's yaml: job-devworkspace-backup-29mb2.yaml

There weren't any pod logs (since the pod never started), only many Error creating: pods "devworkspace-backup-29mb2-" is forbidden: unable to validate against any security context constraint: ... events:

Screen.Recording.2025-11-18.at.3.59.33.PM.mov

Output of oc get events -n admin-devspaces: events.log

The registry configuration is now stored in a separate nested struct.

Signed-off-by: Ales Raszka <[email protected]>
An SA is created for each workspace being backed up to avoid ownership conflicts.

Signed-off-by: Ales Raszka <[email protected]>
Switching the base image to podman allowed us to run the backup job as a
regular user 1000 without any privilege escalation.

Signed-off-by: Ales Raszka <[email protected]>
@rohanKanojia
Member

rohanKanojia commented Nov 20, 2025

@Allda :

Edit: I was able to resolve this issue by granting additional permissions to the ServiceAccount; however, I'm not sure whether this is a required step or an issue:

oc adm policy add-scc-to-user anyuid -z devworkspace-job-runner-workspacea5357a3c22ce497a

After updating the ServiceAccount permissions, I can see the backup job being executed and creating images in the configured registry:

Screenshot 2025-11-20 at 20 20 56

Question: Will the backup image be platform dependent? I see only the linux/amd64 arch being pushed for it. My cluster was also linux/amd64.


I'm also facing the same issue as David. I was trying to test your changes on CRC.

Environment:

OS: Linux

CRC Version:

CRC version: 2.53.0+a6f712
OpenShift version: 4.19.3
MicroShift version: 4.19.0

Steps to Reproduce:

  1. Install DWO based on your PR changes
  2. Edit DevWorkspaceOperatorConfig to add backup config
  config:
    workspace:
      backupCronJob:
        enable: true
        registry:
          authSecret: dockerhub-push-secret
          path: docker.io/rohankanojia
        schedule: '* * * * *'
  3. Create a registry auth secret in the openshift-operators namespace
  4. Create a DevWorkspace
# Run from the DWO root directory
oc create -f samples/code-latest.yaml
  5. Stop the DevWorkspace
oc patch devworkspace code-latest \
        --type=merge \
        -p '{"spec": {"started": false}}'
  6. The job was created, but it never became ready
oc get jobs -w
NAME                        STATUS    COMPLETIONS   DURATION   AGE
devworkspace-backup-ntzf7   Running   0/1           34s        34s

Upon checking details I see this error:

oc describe job
Name:             devworkspace-backup-ntzf7
Namespace:        rokumar-dev
Selector:         batch.kubernetes.io/controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
Labels:           controller.devfile.io/backup-job=true
                  controller.devfile.io/devworkspace_id=workspace388fc6409bc642e7
Annotations:      <none>
Controlled By:    DevWorkspace/code-latest
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Suspend:          false
Backoff Limit:    6
Start Time:       Thu, 20 Nov 2025 17:35:00 +0530
Pods Statuses:    0 Active (0 Ready) / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           batch.kubernetes.io/controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
                    batch.kubernetes.io/job-name=devworkspace-backup-ntzf7
                    controller-uid=14a22818-4416-4f52-b387-2cc0dd2e8df9
                    job-name=devworkspace-backup-ntzf7
  Annotations:      io.kubernetes.cri-o.Devices: /dev/fuse
  Service Account:  devworkspace-job-runner-workspace388fc6409bc642e7
  Containers:
   backup-workspace:
    Image:      quay.io/devfile/project-backup:next
    Port:       <none>
    Host Port:  <none>
    Args:
      /workspace-recovery.sh
      --backup
    Environment:
      DEVWORKSPACE_NAME:             code-latest
      DEVWORKSPACE_NAMESPACE:        rokumar-dev
      WORKSPACE_ID:                  workspace388fc6409bc642e7
      BACKUP_SOURCE_PATH:            /workspace/workspace388fc6409bc642e7/projects
      DEVWORKSPACE_BACKUP_REGISTRY:  docker.io/rohankanojia
      PODMAN_PUSH_OPTIONS:           --tls-verify=false
      REGISTRY_AUTH_FILE:            /home/podman/.docker/.dockerconfigjson
    Mounts:
      /home/podman/.docker from registry-auth-secret (ro)
      /var/lib/containers from build-storage (rw)
      /workspace from workspace-data (rw)
  Volumes:
   workspace-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  claim-devworkspace
    ReadOnly:   false
   build-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
   registry-auth-secret:
    Type:          Secret (a volume populated by a Secret)
    SecretName:    devworkspace-backup-registry-auth
    Optional:      false
  Node-Selectors:  <none>
  Tolerations:     <none>
Events:
  Type     Reason        Age               From            Message
  ----     ------        ----              ----            -------
  Warning  FailedCreate  8s (x6 over 39s)  job-controller  Error creating: pods "devworkspace-backup-ntzf7-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .containers[0].runAsUser: Invalid value: 1000: must be in the ranges: [1000660000, 1000669999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid-v2": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "hostpath-provisioner": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

Perhaps pod creation is forbidden by OpenShift SecurityContextConstraints because of this:

SecurityContext: &corev1.SecurityContext{
	RunAsUser: ptr.To[int64](1000),
},
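
To make the events above concrete: the restricted-v2 message says runAsUser "must be in the ranges" assigned to the namespace, so a hard-coded UID such as 1000 (or 0 in the earlier report) falls outside it. A minimal sketch of that range check, using the range from the event above (the `uidRange` type is hypothetical and only mirrors the wording of the error, not OpenShift's implementation):

```go
package main

import "fmt"

// uidRange is an inclusive UID range, as in the
// "must be in the ranges: [min, max]" part of the SCC error.
type uidRange struct {
	Min, Max int64
}

// allows reports whether uid lies inside the inclusive range.
func (r uidRange) allows(uid int64) bool {
	return uid >= r.Min && uid <= r.Max
}

func main() {
	// Range copied from the FailedCreate event for the rokumar-dev namespace.
	r := uidRange{Min: 1000660000, Max: 1000669999}
	for _, uid := range []int64{0, 1000, 1000660000} {
		fmt.Printf("runAsUser=%d allowed=%t\n", uid, r.allows(uid))
	}
}
```

Under this reading, leaving RunAsUser unset would let the platform pick a UID from the namespace range, whereas a fixed value requires an SCC such as anyuid, which matches the workaround mentioned above.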

@Allda
Collaborator Author

Allda commented Nov 25, 2025

@rohanKanojia @dkwon17 I managed to find an alternative that doesn't violate any OCP-specific security constraints. Podman was replaced with oras to store backup artifacts in the registry. This simplifies a lot of things, as it requires neither podman-specific config nor running podman inside a container. Please give it another try, and don't forget to rebuild the backup image first.

I tested it locally on Kind and remotely on an OCP 4.20 cluster.

@rohanKanojia
Member

rohanKanojia commented Nov 25, 2025

@Allda : Thank you! I can confirm that this approach works without any explicit configuration.

I can see the backup image being pushed to the configured registry. However, I see that the image has a different format, application/vnd.oci.empty.v1+json. I guess this is due to using oras, right?

I tested on CRC with OpenShift 4.20.1


// ExtraArgs are additional arguments passed to the oras CLI
// +kubebuilder:validation:Optional
ExtraArgs string `json:"extraArgs,omitempty"`
Member

What is the intended use case for passing arbitrary oras CLI flags? Should we allow users to inject raw CLI arguments, given that this might expose the underlying backup implementation?
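
To illustrate the concern, here is a rough sketch of how a free-form ExtraArgs string could end up on an `oras push` command line. This is an assumption about the mechanism, not the PR's actual script: `buildPushArgs` is a hypothetical helper, the image reference and archive name are placeholders, and `--plain-http` is used only as an example flag.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPushArgs assembles a hypothetical argument list for "oras push".
// extraArgs is split on whitespace, so quoted values containing spaces
// would not survive; that is one risk of accepting a raw string.
func buildPushArgs(ref, archive, extraArgs string) []string {
	args := []string{"push"}
	args = append(args, strings.Fields(extraArgs)...)
	return append(args, ref, archive)
}

func main() {
	args := buildPushArgs(
		"kind-registry:5000/backup/backup-default-common-pvc-test:latest",
		"backup.tar.gz", // placeholder archive name
		"--plain-http",
	)
	fmt.Println("oras " + strings.Join(args, " "))
}
```

Anything placed in the string is forwarded verbatim, which is why a typed, documented option per supported behavior may be easier to keep stable than a pass-through field.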

Path string `json:"path,omitempty"`
// AuthSecret is the name of a Kubernetes secret of
// type kubernetes.io/dockerconfigjson
// +kubebuilder:validation:Optional
Member

Do you think it would help clarify that the AuthSecret must reside in the same namespace as the DevWorkspaceOperatorConfig?
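
For readers following the review, the nested configuration under discussion might be modeled roughly as below. This is a sketch assembled from the field snippets and YAML shown in this conversation: `Path`, `AuthSecret`, and `ExtraArgs` and their JSON names come from those excerpts, while the struct names, the types of `Enable` and `Schedule`, and the helpers are assumptions; the real CRD types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RegistryConfig mirrors the registry fields seen in the review snippets.
type RegistryConfig struct {
	// Path is the registry location that backups are pushed to.
	Path string `json:"path,omitempty"`
	// AuthSecret is the name of a kubernetes.io/dockerconfigjson secret.
	AuthSecret string `json:"authSecret,omitempty"`
	// ExtraArgs are additional arguments passed to the oras CLI.
	ExtraArgs string `json:"extraArgs,omitempty"`
}

// BackupCronJobConfig is an assumed shape for config.workspace.backupCronJob.
type BackupCronJobConfig struct {
	Enable   bool           `json:"enable,omitempty"`
	Registry RegistryConfig `json:"registry"`
	Schedule string         `json:"schedule,omitempty"`
}

// sampleConfig reproduces the values used in the CRC test report above.
func sampleConfig() BackupCronJobConfig {
	return BackupCronJobConfig{
		Enable: true,
		Registry: RegistryConfig{
			Path:       "docker.io/rohankanojia",
			AuthSecret: "dockerhub-push-secret",
		},
		Schedule: "* * * * *",
	}
}

// render serializes cfg to compact JSON, returning "" on error.
func render(cfg BackupCronJobConfig) string {
	out, err := json.Marshal(cfg)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(render(sampleConfig()))
}
```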

5 participants